Libris Britannia 4

Text File | 1989-10-27 | 7KB | 160 lines

Using LWL85 with Esee via Esee2LWL
==================================

The program LWL85, described by Li, Wu, and Luo, 1985, MBE 2:150-174,
gives two-parameter (and sometimes one-parameter) estimates of genetic
distance for protein coding genes. LWL85 is not included with ESEE;
however, if you do have it, the utility program Esee2LWL should
simplify the use of Esee save files with LWL85.

As far as ease of use goes, LWL85 is on a par with Felsenstein's Phylip
programs, except that there are fewer options to worry about, and this
program is a bit more finicky about which columns the data are in, as
befits a Fortran program.

***********************************************************

USING ESEE to make LWL files.

(automatic method)

- Start ESEE and align your sequences. I recommend deleting any codons
  that are not found across all of the sequences being considered.
- Save the file in an ESEE save file.
- Leave ESEE.
- Run the program Esee2lwl.exe that is on this disk.

(manual method)

- Start ESEE and align your sequences.
- Carefully put in the LENGTH-space-NAME-space-COMMENT fields at the
  very beginning of each sequence.
- Insert blanks between those fields and the actual start of the
  sequence... the sequence should start at POSITION 82. This is
  crucial.
- With the cursor at position 82, press F1 to get triplet spacing.
- Repeat these steps with each sequence that you wish to output.
- Go to the print-out window and change the line length to 80.
- Output all of the sequences, one at a time, to an ASCII file using
  ESEE's OUTPUT command. When the prompt for overwrite, append or
  abort appears, select append.
- Save your work to a save file if you wish, and exit ESEE.

**********************************************************************

WHAT DOES Esee2LWL do?

In roughly the following order it:

- prompts you for a file name, then inputs the data, skipping
  sequences that are type P, T or A and taking only sequences of
  type N
- aborts if the number of valid sequences is less than 2
- trims the sequence names to 59 characters
- trims the sequence lengths to 2100 characters (if necessary)
- checks for sequence length conflicts. The sequences should all be of
  uniform length for LWL. If the (now trimmed) sequences are not of
  the same length, the program generates a report of the lengths of
  the first, shortest and longest sequences. You are then prompted for
  the sequence length to use. You may specify any integer ranging from
  3 to the length of the longest sequence. Sequences that are already
  shorter than the length you specify will be padded with either N's
  or ?'s (depending on whether you are working with DNA or protein).
- checks for name conflicts; you are prompted until all of the names
  are unique
- prompts you for a name for the output file; you have an option to
  escape if the file already exists
- sends data to the output file in the format required by LWL

You can use ? for ambiguous bases and *** for ambiguous codons. If you
use ?, make sure to specify 001 as the last part of the LWL prompt.
For instance, to get pairwise distances between 3 taxa that have some
ambiguities, you would answer LWL's prompt:

    003003001001
      ^  ^  ^  ^
      |  |  |  |
      |  |  |  +-- 001 = throw out codons with ambiguities,
      |  |  |      for all comparisons in the run
      |  |  +----- 001 or 000 (two different strategies for
      |  |         confusing a.a. substitutions)
      |  +-------- # pairwise comparisons
      +----------- # sp in file

**********************************************************************

I will now attempt to explain the input format of LWL85.

Each sequence begins with the length in nucleotides, right justified
in a field consisting of the first twenty spaces of the first line for
that sequence. In plain terms, the number expressing the length must
end on column 20. Say the number is 109, where the nine is in column
19. The program will interpret this number as 1090!
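
The column-20 rule can be illustrated with a short Python sketch. It is
not part of ESEE or LWL85: the names length_field and read_fixed_int
are made up for illustration, and reading blanks as zeros is an
assumption about the Fortran input, inferred from the 109-becomes-1090
example above.

```python
FIELD_WIDTH = 20  # the length occupies the first twenty columns


def length_field(n_nucleotides: int) -> str:
    """Right-justify the length so its last digit lands in column 20."""
    text = str(n_nucleotides)
    if len(text) > FIELD_WIDTH:
        raise ValueError("length does not fit in the field")
    return text.rjust(FIELD_WIDTH)


def read_fixed_int(field: str) -> int:
    """Mimic a fixed-width integer read that treats blanks as zeros.

    This is an assumed model of the Fortran behaviour, not LWL85 code.
    """
    return int(field.replace(" ", "0"))


good = length_field(109)                  # nine in column 20
bad = "109".rjust(FIELD_WIDTH - 1) + " "  # nine in column 19

print(read_fixed_int(good))  # 109
print(read_fixed_int(bad))   # 1090
```

Under that assumption, the misplaced field picks up a trailing zero,
which is why the length must end exactly on column 20.
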
After the length, skip a space and put in the name. Then skip another
space and put in an optional label, if you wish. Then comes the
sequence itself. Here are the rules:

- column 1 is empty
- the sequence is presented in triplets, 20 triplets per line
- if any of the sequences is missing a residue relative to any of the
  others, convert that ENTIRE CODON to ***
- lines are 80 columns wide
- don't include the initiation and termination codons

============================================================================

When you run LWL85 there is a series of prompts.

The first prompt asks you to type either ZZ3 or ZZZ3. This refers to
the codon designations and mutational pathway weights. For rapidly
evolving genes it is recommended that you use ZZZ3. Use ZZ3 for the
insulin example.

Then you are asked for the name of the data file.

Next you have to enter the name of an output file. I believe PRN:
works for the printer and CON: works for the screen.

The next (and final) prompt causes the most problems for users. It
asks for four parameters, each right justified in a 3-character wide
field.

- The first parameter is the number of sequences in the file.
- The second is the number of sequences to include in the pairwise
  comparisons. I see no reason not to include all of them, so this
  should be the same number as the first parameter.
- The third parameter is ICHECK. It deals with how certain conflicts
  in mutational pathways are handled. At present, I'm not sure how it
  affects the result.
- The fourth parameter is ITOSS. ITOSS=1 means that if a gap (deletion
  or undetermined residue) exists in any of the given sequences, it is
  assumed that a gap exists for all sequences at the same site. In the
  insulin example ITOSS=1 gives the same result as ITOSS=0, since the
  asterisk method is used to handle gaps.
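
The four-field response can be assembled mechanically. Here is a
minimal Python sketch (not part of LWL85; lwl_prompt is a hypothetical
helper name). It zero-pads each value to three characters so that the
response contains no blanks.

```python
def lwl_prompt(n_in_file: int, n_compare: int, icheck: int, itoss: int) -> str:
    """Build the 12-character response: four zero-padded 3-character fields."""
    values = (n_in_file, n_compare, icheck, itoss)
    for v in values:
        if not 0 <= v <= 999:
            raise ValueError(f"{v} does not fit in a 3-character field")
    return "".join(f"{v:03d}" for v in values)


print(lwl_prompt(4, 4, 0, 0))  # 004004000000
print(lwl_prompt(3, 3, 1, 1))  # 003003001001
```

Zero padding is used instead of blank padding so that a misplaced
digit cannot be read as a larger number.
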
Try LWL with the file insulin. I suggest using this string for the
parameter prompt:

    004004000000    {with icheck off}
    004004001000    {with icheck on}

Notice how you have to be somewhat defensive about the format of these
numbers because of the way that Fortran deals with input. Thus 004
means 4, while space-4-space means 40 and 4-space-space means 400.
Likewise 1 is expressed as 001. Try to avoid using any spaces at all
in your response to this prompt.

Eric Cabot, August 1989